Malaria Detection¶

Executive Summary ¶

Key takeaways

  • This project proposes a Computer Vision based Machine Learning model that speeds up the early identification of malaria-infected people, so that they can be treated in time to avoid hospitalization or even death.
  • We use the cell image datasets made available to us to train the model and validate its performance.
  • We built various CNN models; the final model we chose is trained on augmented cell images along with the originals, and it detects infected cells with a very high degree of accuracy.

Next Steps

  • The model performs very well, with a high level of accuracy, and is ready to be deployed across various geographies
  • Given additional time and resources, we can continue investigating model variations that may produce even better results, e.g., training our final model on HSV images, or transfer learning with VGG19 or ResNet models
  • While the model is up and running, we can continue to collect additional samples from the field, which can be used to further train and fine-tune it
  • If and when we arrive at a significantly better-performing model, we can roll out the update
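The transfer-learning variant mentioned in the next steps could be set up roughly as follows. This is a minimal sketch, not our final architecture: it assumes the Keras VGG19 application and our 64x64x3 inputs, and it passes `weights=None` only to keep the sketch self-contained (in practice you would pass `weights='imagenet'` to download and reuse the pretrained weights).

```python
# Hedged sketch: VGG19 as a frozen feature extractor with a small classifier head
import tensorflow as tf
from tensorflow.keras.applications import VGG19
from tensorflow.keras import layers, models

base = VGG19(weights=None,                  # use weights='imagenet' in practice
             include_top=False,             # drop VGG19's own classifier head
             input_shape=(64, 64, 3))       # matches our resized cell images
base.trainable = False                      # freeze the convolutional base

model_tl = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(32, activation='relu'),
    layers.Dense(2, activation='softmax')   # parasitized vs uninfected
])

model_tl.compile(loss='categorical_crossentropy',
                 optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                 metrics=['accuracy'])
```

With the base frozen, only the small head trains at first; the base can be unfrozen later for fine-tuning at a lower learning rate.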

Problem Definition¶

The context: Why is this problem important to solve?

  • Malaria is a contagious disease caused by Plasmodium parasites that are transmitted to humans through mosquitoes
  • The lethal parasites can survive for more than a year in the human body without causing any symptoms
  • Almost 50% of the world's population is at risk of malaria
  • There were 229 million malaria cases and 400,000 malaria-related deaths reported worldwide in 2019 alone
  • Manual inspection of red blood cells by experienced professionals to discriminate between healthy and infected cells is a tedious and time-consuming process. Furthermore, the accuracy of such diagnostics can be adversely affected by inter-observer variability

The objectives: What is the intended goal?

  • Our goal is to identify malaria-infected people early, without the need for human inspection, so that they can be treated in time to prevent hospitalizations or even deaths
  • Build an automated system for early and accurate detection of malaria
  • It should be able to identify whether a given red blood cell image shows a cell infected with malaria
  • It should classify the image as parasitized or uninfected accordingly

The key questions: What are the key questions that need to be answered?

  • Can the model we develop identify parasitized and uninfected cells accurately, and can it be trusted?
  • What would be our recommendation to authorities on making use of this model?

The problem formulation: What is it that we are trying to solve using data science?

  • Using various data science techniques, we will explore the sample cells dataset
  • We will build an efficient Computer Vision model to classify those images as parasitized or uninfected
  • We will come up with recommendations for successfully deploying the model

Data Description ¶

There are a total of 24,958 training and 2,600 test images (in color), taken from microscopic slide images. These images fall into the following categories:

Parasitized: The parasitized cells contain the Plasmodium parasite which causes malaria
Uninfected: The uninfected cells are free of the Plasmodium parasites

Mount the Drive

In [1]:
# Mount Google drive so that this notebook can be used from Colab and load the dataset from my Google drive

from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Loading libraries¶

In [2]:
# Importing the required libraries
# The basic set is imported here, and we can import others as the need arises

import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
import zipfile
import io
from PIL import Image

# Suppress "will be deprecated in future" warning messages that we are ignoring knowingly
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

Let us load the data¶

The data has been downloaded as a zip file using the link provided on Olympus

The zip file has separate folders for train and test data, each containing images of various sizes for parasitized and uninfected cells.

The size of all images must be the same and should be converted to 4D arrays so that they can be used as an input for the convolutional neural network. Also, we need to create the labels for both types of images to be able to train and test the model.

Let's do the same for the training data first and then we will use the same code for the test data as well.

In [3]:
# Create a re-usable function to prepare image datasets
# We will read image files directly from the zip file
# Will read both parasitized and uninfected images under the path passed in
# Will return the images dataset in 4D, ready to use in our models
# And will return the labels in 1D

def prepare_images_dataset(zip_archive,                     # zip file that is already open for reading
                           file_path_prefix):               # path prefix for train / test datasets

   # initialize the datasets to be returned
   X = []
   y = []

   # load the images
   for filename in zip_archive.namelist():
      if filename.startswith(file_path_prefix):             # only process files under the file_path_prefix directory
         if filename.endswith('.png'):                      # ignore non-image files
            with zip_archive.open(filename) as image_file:

               # read the image and add to the list in correct format
               image = Image.open(io.BytesIO(image_file.read()))
               image = image.convert('RGB')                 # Convert them to RGB format
               resized_image = image.resize((64, 64))       # Resize images to a standard size, as images of different sizes can't be used in models
               npImage = np.array(resized_image)
               X.append(npImage)

               if filename.startswith(file_path_prefix + 'parasitized/'):
                  y.append(1)                               # the image is of a parasitized cell
               else:
                  y.append(0)                               # the image is of an uninfected cell

   # Convert the list to arrays
   X = np.array(X)
   y = np.array(y)

   return X, y                                              # these datasets are ready to be used in our models
In [4]:
# Load images and labels for both train and test datasets

# Use one of the two lines below to open the zip file with images, depending on if you are running the notebook locally or on Colab
# zip_archive = zipfile.ZipFile('cell_images.zip', 'r')
zip_archive = zipfile.ZipFile('/content/drive/MyDrive/mit-adsp/Capstone/cell_images.zip', 'r')

X_train, y_train = prepare_images_dataset(zip_archive, 'cell_images/train/')
X_test, y_test = prepare_images_dataset(zip_archive, 'cell_images/test/')

zip_archive.close()                                         # close the zip file, as we don't need it anymore
del zip_archive                                             # free-up the memory used by zip_archive

Check the shape of train and test images¶

In [5]:
# Let's print out the shapes of the image datasets

print('Shape of training dataset :', X_train.shape)
print('Shape of test dataset :', X_test.shape)
Shape of training dataset : (24958, 64, 64, 3)
Shape of test dataset : (2600, 64, 64, 3)

Check the shape of train and test labels¶

In [6]:
# Print out the shapes of the label arrays

print('Shape of training labels :', y_train.shape)
print('Shape of test labels :', y_test.shape)
Shape of training labels : (24958,)
Shape of test labels : (2600,)

Observations and insights:¶

  • The training set has 24,958 images and the test set has 2,600 images
  • We have loaded these images into their corresponding datasets and created their respective label arrays
  • Images have been resized to be uniform and the dataset is set in 4D as expected by our models

Check the minimum and maximum range of pixel values for train and test images¶

In [7]:
# Print the range of pixel values, from minimum to maximum, for each dataset

print('The range (min - max) of pixel values for training images :', np.amin(X_train), ' - ', np.amax(X_train))
print('The range (min - max) of pixel values for test images :', np.amin(X_test), ' - ', np.amax(X_test))
The range (min - max) of pixel values for training images : 0  -  255
The range (min - max) of pixel values for test images : 0  -  255

Observations and insights:¶

  • Pixel values in both datasets range between 0 and 255, as they represent individual pixels of RGB images

Count the number of values in both uninfected and parasitized

In [8]:
# Get the count of uninfected and parasitized cell images in the training set

values, counts = np.unique(y_train, return_counts=True)
print('Training set: Number of uninfected  cell images :', counts[0])
print('Training set: Number of parasitized cell images :', counts[1])


# Get the count of uninfected and parasitized cell images in the test set

values, counts = np.unique(y_test, return_counts=True)
print('\nTesting set:  Number of uninfected  cell images :', counts[0])
print('Testing set:  Number of parasitized cell images :', counts[1])
Training set: Number of uninfected  cell images : 12376
Training set: Number of parasitized cell images : 12582

Testing set:  Number of uninfected  cell images : 1300
Testing set:  Number of parasitized cell images : 1300

Normalize the images

In [9]:
# As we have seen, pixel values in the training and test datasets range between 0 and 255
# Since training is more efficient with normalized data, let's normalize both the train and test datasets

X_train_normalized = X_train.astype('float32') / 255.0
X_test_normalized = X_test.astype('float32') / 255.0

# Print out the range of values after normalization
print('The normalized range (min - max) of pixel values for training images :', np.amin(X_train_normalized), ' - ', np.amax(X_train_normalized))
print('The normalized range (min - max) of pixel values for test images :', np.amin(X_test_normalized), ' - ', np.amax(X_test_normalized))
The normalized range (min - max) of pixel values for training images : 0.0  -  1.0
The normalized range (min - max) of pixel values for test images : 0.0  -  1.0

Observations and insights:¶

  • Training and test datasets are (roughly) equally distributed between parasitized and uninfected cell images, with no bias
  • After normalization, pixel values range between 0 and 1 for all images, as expected

Plot to check if the data is balanced

In [10]:
# Let's use histograms to visualize whether the parasitized and uninfected classes are balanced, for both the train and test datasets

plt.figure(figsize=(12, 6))

# plot the histograms side by side for both datasets, train dataset first
plt.subplot(1, 2, 1)
plt.hist(y_train, bins=2, rwidth=0.60 )
plt.title('Frequency Distribution of Training set')
plt.xticks([0, 1])
plt.xlabel('Parasitized?')
plt.ylabel('Image count')

# plot the histogram for test dataset
plt.subplot(1, 2, 2)
plt.hist(y_test, bins=2, rwidth=0.60 )
plt.title('Frequency Distribution of Test set')
plt.xticks([0, 1])
plt.xlabel('Parasitized?')
plt.ylabel('Image count')

plt.show()

Observations and insights:

  • As we have already seen, and from the histograms above, both training and test datasets are well balanced with almost equal number of parasitized and uninfected samples in both of them
  • So, bias in the dataset is not a concern for us

Data Exploration¶

Let's visualize the images from the train data

In [11]:
# Seed the random numbers first
import random
random.seed(9)
np.random.seed(9)
tf.random.set_seed(9)
In [12]:
# Create a function to display a set of randomly chosen images from the given dataset

def displayImages(dataset,                                  # dataset of images to be used
                  rows,                                     # number of rows in the plot
                  cols,                                     # number of columns in the plot
                  figsize,                                  # size of the figure
                  showLabels):                              # whether to show labels or not

   labels = ['Uninfected', 'Parasitized']                   # labels to show for images

   tmp_array = np.arange(dataset.shape[0])                  # index values to choose from, based on the image count in the dataset
   images_to_show = np.random.choice(tmp_array, size=rows * cols, replace=False)

   fig = plt.figure(figsize=figsize)                        # define the figure size
   for i in range(rows):                                    # iterate through rows first
      for j in range(cols):                                 # for each row, iterate through columns
         index = i * cols + j                               # calculate the index of the image to plot
         ax = fig.add_subplot(rows, cols, index + 1)        # add the subplot
         ax.imshow(dataset[images_to_show[index], :],       # plot the image using a gray colormap
                   cmap=plt.get_cmap('gray'))
         if showLabels:
            ax.set_title(labels[y_train[images_to_show[index]]] +   # show the label and index as the caption
                         ' [' + str(images_to_show[index]) + ']', fontsize=9)
         plt.xticks([])                                     # remove unnecessary ticks on the x axis
         plt.yticks([])                                     # remove unnecessary ticks on the y axis
   plt.show()                                               # finally, display the plot
In [13]:
# Display 10 random images from the training dataset along with their labels

displayImages(X_train_normalized, 2, 5, (12, 6), True)

Observations and insights:

  • Uninfected images look like homogeneous blobs of uniform color, with no spots
  • In parasitized images, we can see spots of various shapes, colors, and intensities

Visualize the images with subplot(6, 6) and figsize = (12, 12)

In [14]:
# Display 36 random images from the training dataset in a 6x6 plot, along with their labels

displayImages(X_train_normalized, 6, 6, (12, 12), True)

Observations and insights:

  • We make use of the function to show a 6x6 subplot with a figsize of (12, 12)
  • The plots are consistent: we can see spots in the parasitized images and none in the uninfected ones

Plotting the mean images for parasitized and uninfected

In [15]:
# Extract parasitized images into a separate set and visualize some random samples

X_train_parasitized = X_train_normalized[y_train == 1]

displayImages(X_train_parasitized, 3, 6, (12, 6), False)
In [16]:
# Extract uninfected images into a separate set and visualize some random samples

X_train_uninfected = X_train_normalized[y_train == 0]

displayImages(X_train_uninfected, 3, 6, (12, 6), False)
In [17]:
# Let's calculate the mean images for parasitized and uninfected sets from the training dataset

mean_parasitized_image = np.mean(X_train_parasitized, axis=0)
mean_uninfected_image = np.mean(X_train_uninfected, axis=0)

print(mean_parasitized_image.shape, mean_uninfected_image.shape)
(64, 64, 3) (64, 64, 3)

Mean image for parasitized

In [18]:
# display the mean parasitized image

plt.imshow(mean_parasitized_image, cmap=plt.get_cmap('gray'))
plt.show()

Mean image for uninfected

In [19]:
# display the mean uninfected image
plt.imshow(mean_uninfected_image, cmap=plt.get_cmap('gray'))
plt.show()

Check difference of images from the mean image

In [20]:
from sklearn.preprocessing import MinMaxScaler

def DisplayMeanAndDifferenceImages(sample_image, mean_image, image_type):
    diff_image = sample_image - mean_image

    # As the difference image has negative values due to subtraction, we need to rescale them between 0 and 1
    # Using Normalize in imshow does not seem to help, so explicitly rescaling the image
    scaler = MinMaxScaler(feature_range=(0, 1))
    diff_image_2d = diff_image.reshape(-1, 1)
    diff_image_scaled_2d = scaler.fit_transform(diff_image_2d)
    diff_image_scaled = diff_image_scaled_2d.reshape(diff_image.shape)

    fig, axs = plt.subplots(1, 3, figsize=(12, 6))
    axs[0].imshow(sample_image)
    axs[0].set_title('Sample Image - ' + image_type)
    axs[1].imshow(mean_image)
    axs[1].set_title('Mean Image - ' + image_type)
    axs[2].imshow(diff_image_scaled)
    axs[2].set_title('Difference Image - ' + image_type)
    plt.show()
In [21]:
# Display a random parasitized sample image and its difference image from its mean

img_to_display = np.random.randint(0, X_train_parasitized.shape[0])             # choose a random sample
DisplayMeanAndDifferenceImages(X_train_parasitized[img_to_display], mean_parasitized_image, 'Parasitized')
In [23]:
# Display a random uninfected sample image and its difference image from its mean

img_to_display = np.random.randint(0, X_train_uninfected.shape[0])             # choose a random sample
DisplayMeanAndDifferenceImages(X_train_uninfected[img_to_display], mean_uninfected_image, 'Uninfected')

Observations and insights:

  • There are no visible differences between the mean parasitized cell image and mean uninfected cell image
  • Corresponding mean subtracted images seem to only take away the prominence of infected spots rather than adding any value
  • So, we won't apply mean subtraction technique

Converting RGB to HSV of Images using OpenCV

Converting the train data
In [24]:
# Convert the training dataset images to HSV color space

import cv2

X_train_hsv = []

# convert the original un-normalized RGB images to HSV colorspace
for rgb_image in X_train:
   hsv_image = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2HSV)
   X_train_hsv.append(hsv_image)

X_train_hsv = np.array(X_train_hsv)

# normalize the hsv images so that we can utilize them in our models
X_train_normalized_hsv = X_train_hsv.astype('float32') / 255.0

print(X_train_normalized_hsv.shape)
(24958, 64, 64, 3)
In [25]:
# We will pick 18 random images from the dataset and display their original RGB and processed images side by side
# so that we can compare them easily

def displayOriginalAndProcessedImages(rgb_images, processed_images):

   labels = ['Uninfected', 'Parasitized']

   tmp_array = np.arange(rgb_images.shape[0])
   images_to_show = np.random.choice(tmp_array, size=18, replace=False)

   fig = plt.figure(figsize=(12, 12))

   rows = 6                                                 # total of 36 images (18x2)
   cols = 6

   for i in range(18):

      index = i * 2 + 1

      # Show RGB image first
      ax = fig.add_subplot(rows, cols, index)
      ax.imshow(rgb_images[images_to_show[i], :], cmap=plt.get_cmap('gray'))
      ax.set_title(labels[y_train[images_to_show[i]]] + ' [orig-' + str(images_to_show[i]) + ']', fontsize=7)
      plt.xticks([])
      plt.yticks([])

      # Show HSV image next to it
      ax = fig.add_subplot(rows, cols, index + 1)
      ax.imshow(processed_images[images_to_show[i], :], cmap=plt.get_cmap('gray'))
      ax.set_title(labels[y_train[images_to_show[i]]] + ' [proc-' + str(images_to_show[i]) + ']', fontsize=7)
      plt.xticks([])
      plt.yticks([])

   plt.show()
In [26]:
# Compare a random set of RGB images and their corresponding HSV images from the training dataset

displayOriginalAndProcessedImages(X_train_normalized, X_train_normalized_hsv)
Converting the test data¶
In [27]:
# Convert the test dataset images to HSV color space

X_test_hsv = []

for rgb_image in X_test:
   hsv_image = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2HSV)
   X_test_hsv.append(hsv_image)

X_test_hsv = np.array(X_test_hsv)

# normalize the hsv images so that we can utilize them in our models
X_test_normalized_hsv = X_test_hsv.astype('float32') / 255.0

print(X_test_normalized_hsv.shape)
(2600, 64, 64, 3)
In [28]:
# Visually compare a random set of RGB images and their corresponding HSV images from the test dataset

displayOriginalAndProcessedImages(X_test_normalized, X_test_normalized_hsv)
Observations and insights:
  • For a given image, the HSV color space seems to highlight color and saturation variations better than the RGB version
  • These could potentially improve the performance of the models
  • We can use the HSV images for one of the model variations that we will build

Processing Images using Gaussian Blurring

Gaussian Blurring on train data

In [29]:
# Let's build the Gaussian-blurred images for the training set

X_train_normalized_blurred = []

for image in X_train_normalized:
   blurred_image = cv2.GaussianBlur(image, (5, 5), 0)
   X_train_normalized_blurred.append(blurred_image)

X_train_normalized_blurred = np.array(X_train_normalized_blurred)

print(X_train_normalized_blurred.shape)
(24958, 64, 64, 3)
In [30]:
# Visually compare a random set of RGB images and their corresponding Gaussian-blurred images from the training dataset

displayOriginalAndProcessedImages(X_train_normalized, X_train_normalized_blurred)

Gaussian Blurring on test data

In [31]:
# Let's build the Gaussian-blurred images for the test set

X_test_normalized_blurred = []

for image in X_test_normalized:
   blurred_image = cv2.GaussianBlur(image, (5, 5), 0)
   X_test_normalized_blurred.append(blurred_image)

X_test_normalized_blurred = np.array(X_test_normalized_blurred)

print(X_test_normalized_blurred.shape)
(2600, 64, 64, 3)
In [32]:
# Visually compare a random set of RGB images and their corresponding Gaussian-blurred images from the test dataset

displayOriginalAndProcessedImages(X_test_normalized, X_test_normalized_blurred)

Observations and insights:¶

  • As expected, the images that have gone through the Gaussian filter are more blurred than the originals
  • But there is no significant, identifiable difference between the original and blurred images

Think About It: Would blurring help us for this problem statement in any way? What else can we try?

  • As we don't see any significant visual differences between the original and blurred images, we don't expect any benefit from the Gaussian blurring technique
  • Since the original images have no sharp edges and are already blurred to an extent, further Gaussian blurring does not seem likely to improve model performance, so we may not use these images when we test the various model variations
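To make the blurring operation concrete: Gaussian blurring convolves each image with a small kernel whose weights fall off with distance from the center. Below is a minimal NumPy-only sketch of the 5x5 kernel that a call like `cv2.GaussianBlur(image, (5, 5), 0)` applies; with sigma passed as 0, OpenCV derives sigma from the kernel size (roughly 1.1 for a 5x5 kernel), which we fix explicitly here for illustration.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.1):
    """Build a normalized size x size Gaussian kernel (illustrative sketch)."""
    ax = np.arange(size) - (size - 1) / 2.0            # coordinates centered on 0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()                       # weights sum to 1

kernel = gaussian_kernel()
print(kernel.shape)                                    # (5, 5)
```

Because the weights sum to 1, blurring preserves overall brightness; it only spreads each pixel's value over its neighborhood, which is why it softens spots rather than removing them.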

Model Building¶

Base Model¶

Note: The Base Model should be fully built and evaluated with all outputs shown to give an idea about the process of the creation and evaluation of the performance of a CNN architecture. A similar process can be followed in iterating to build better-performing CNN architectures.

Importing the required libraries for building and training our Model

In [33]:
# Import the libraries needed for building our CNN models
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, BatchNormalization, LeakyReLU
from keras.optimizers import Adam

One Hot Encoding the train and test labels

In [34]:
# we need to one-hot encode labels so that we can use them in the ANN/CNN models

y_train_encoded = tf.keras.utils.to_categorical(y_train)
y_test_encoded = tf.keras.utils.to_categorical(y_test)

Building the model

In [35]:
# Re-init random numbers
random.seed(3)
np.random.seed(3)
tf.random.set_seed(3)
In [36]:
X_train_normalized.shape
Out[36]:
(24958, 64, 64, 3)
In [37]:
# Create a base CNN model just to showcase the process involved in creating and evaluating the performance of a CNN model

model_base = Sequential()
model_base.add(Conv2D(16, (3, 3), padding="same", activation='relu', input_shape=(64, 64, 3)))  # input images are of dimension (64, 64, 3)
model_base.add(MaxPooling2D(pool_size=(2, 2)))
model_base.add(Flatten())
model_base.add(Dense(32, activation='relu'))
model_base.add(Dense(2, activation='softmax'))

Compiling the model

In [38]:
# Compile the model, using Adam optimizer and accuracy for metrics

model_base.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])

Using Callbacks

In [39]:
# Let's define a callback to enable early stopping when validation accuracy stops improving across epochs

early_stopping_cb = tf.keras.callbacks.EarlyStopping(
   monitor='val_accuracy', min_delta=0.05, patience=3, verbose=1, restore_best_weights=False)

Fit and train our Model

In [40]:
# Train the model using the training set of normalized images, with the early stopping callback
# Let's use 20% of data for validation in each epoch

model_history = model_base.fit(X_train_normalized, y_train_encoded, validation_split=0.2, batch_size=32, verbose=1,
   epochs=50, callbacks=[early_stopping_cb])
Epoch 1/50
624/624 [==============================] - 6s 5ms/step - loss: 0.5586 - accuracy: 0.7160 - val_loss: 0.7892 - val_accuracy: 0.6062
Epoch 2/50
624/624 [==============================] - 3s 4ms/step - loss: 0.3633 - accuracy: 0.8573 - val_loss: 0.5340 - val_accuracy: 0.7919
Epoch 3/50
624/624 [==============================] - 2s 4ms/step - loss: 0.2364 - accuracy: 0.9123 - val_loss: 0.6392 - val_accuracy: 0.7582
Epoch 4/50
624/624 [==============================] - 2s 4ms/step - loss: 0.1802 - accuracy: 0.9342 - val_loss: 0.3798 - val_accuracy: 0.8486
Epoch 5/50
624/624 [==============================] - 2s 4ms/step - loss: 0.1533 - accuracy: 0.9441 - val_loss: 0.4224 - val_accuracy: 0.8566
Epoch 6/50
624/624 [==============================] - 2s 4ms/step - loss: 0.1269 - accuracy: 0.9557 - val_loss: 0.3219 - val_accuracy: 0.8844
Epoch 7/50
624/624 [==============================] - 3s 4ms/step - loss: 0.1179 - accuracy: 0.9586 - val_loss: 0.2723 - val_accuracy: 0.9141
Epoch 8/50
624/624 [==============================] - 2s 4ms/step - loss: 0.0875 - accuracy: 0.9708 - val_loss: 0.3540 - val_accuracy: 0.8802
Epoch 9/50
624/624 [==============================] - 2s 4ms/step - loss: 0.0739 - accuracy: 0.9735 - val_loss: 0.2889 - val_accuracy: 0.9002
Epoch 10/50
624/624 [==============================] - 2s 4ms/step - loss: 0.0605 - accuracy: 0.9790 - val_loss: 0.7237 - val_accuracy: 0.7943
Epoch 10: early stopping

Evaluating the model on test data

In [41]:
# Let's evaluate the model for accuracy against test dataset

test_accuracy = model_base.evaluate(X_test_normalized, y_test_encoded, verbose=2)
82/82 - 0s - loss: 0.4066 - accuracy: 0.8692 - 319ms/epoch - 4ms/step

Classification report & plotting the confusion matrix

In [42]:
# Predict the labels for the test dataset

y_pred = model_base.predict(X_test_normalized)              # returns prediction probabilities for each label

# Convert the prediction probabilities back to single labels

y_pred_labels = np.argmax(y_pred, axis=1)
y_test_labels = np.argmax(y_test_encoded, axis=1)
82/82 [==============================] - 0s 2ms/step
In [43]:
# Print the Classification report

from sklearn.metrics import classification_report
metrics = classification_report(y_test_labels, y_pred_labels)
print(metrics)
              precision    recall  f1-score   support

           0       0.95      0.78      0.86      1300
           1       0.81      0.96      0.88      1300

    accuracy                           0.87      2600
   macro avg       0.88      0.87      0.87      2600
weighted avg       0.88      0.87      0.87      2600

In [44]:
# Print the Confusion Matrix for test predictions using base CNN model

confusion_matrix = tf.math.confusion_matrix(y_test_labels, y_pred_labels)
_, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(confusion_matrix, annot=True, linewidths=.4, fmt="d", square=True, ax=ax)
plt.show()

Record statistics of base model performance¶

In [45]:
# Collect the performance metrics into a dataframe for a later comparison of various models

from sklearn.metrics import precision_recall_fscore_support

model_performances = pd.DataFrame(columns=['Model', 'Accuracy',
                                           'Precision Parasitized', 'Precision Uninfected',
                                           'Recall Parasitized', 'Recall Uninfected',
                                           'F1 Score Parasitized', 'F1 Score Uninfected'])

precision, recall, f1_score, _ = precision_recall_fscore_support(y_test_labels, y_pred_labels)
from sklearn.metrics import accuracy_score
accuracy = round(accuracy_score(y_test_labels, y_pred_labels), 2)   # compute accuracy directly rather than parsing the printed report

model_performances = model_performances.append({
   'Model'                 : 'Base CNN Model',
   'Accuracy'              : accuracy,
   'Precision Parasitized' : round(precision[1], 2),
   'Precision Uninfected'  : round(precision[0], 2),
   'Recall Parasitized'    : round(recall[1], 2),
   'Recall Uninfected'     : round(recall[0], 2),
   'F1 Score Parasitized'  : round(f1_score[1], 2),
   'F1 Score Uninfected'   : round(f1_score[0], 2)
}, ignore_index=True)
In [46]:
# Check that we are capturing the performance metrics properly

model_performances
Out[46]:
Model Accuracy Precision Parasitized Precision Uninfected Recall Parasitized Recall Uninfected F1 Score Parasitized F1 Score Uninfected
0 Base CNN Model 0.87 0.81 0.95 0.96 0.78 0.88 0.86
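A portability note on the cell above: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so on newer pandas the same row can be added with `pd.concat` instead. A minimal sketch (with the column list trimmed for brevity):

```python
import pandas as pd

# Start with an empty frame, as in the cell above (trimmed column list for brevity)
model_performances = pd.DataFrame(columns=['Model', 'Accuracy'])

# Wrap the new row in a one-row DataFrame and concatenate it
new_row = pd.DataFrame([{'Model': 'Base CNN Model', 'Accuracy': 0.87}])
model_performances = pd.concat([model_performances, new_row], ignore_index=True)
```

This produces the same one-row frame as `append` did, and keeps the notebook runnable on current pandas releases.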

Plotting the train and validation curves

In [47]:
# Plot the accuracy of training and validation sets from the model_history that is passed in

def plotAccuracy(model_history, model_name):
   plt.plot(model_history.history['accuracy'])
   plt.plot(model_history.history['val_accuracy'])
   plt.title('Accuracy of : ' + model_name)
   plt.ylabel('Accuracy')
   plt.xlabel('Epoch Count')
   plt.legend(['Training Set Performance', 'Validation Set Performance'], loc='lower right')
   plt.show()
In [48]:
# Plot the accuracy of training and validation sets for the base CNN model

plotAccuracy(model_history, 'Base CNN Model')

Now let's build another model with a few additional layers and check whether we can improve performance. We will add layers as needed and alter the activation functions.

Model 1 with more layers¶

Trying to improve the performance of our model by adding new layers

In [49]:
# Clear Keras' backend to clear the history of previous models

from keras import backend as keras_backend
keras_backend.clear_session()

# Re-init random numbers
random.seed(6)
np.random.seed(6)
tf.random.set_seed(6)

Building the Model¶

In [50]:
# Add additional Conv2D (with LeakyReLU) and Dense(16) layers

model_more_layers = Sequential()
model_more_layers.add(Conv2D(16, (3, 3), padding="same", activation='relu', input_shape=(64, 64, 3)))
model_more_layers.add(Conv2D(32, (3, 3), padding="same"))
model_more_layers.add(LeakyReLU(alpha=0.1))
model_more_layers.add(MaxPooling2D(pool_size=(2, 2)))
model_more_layers.add(Flatten())
model_more_layers.add(Dense(32, activation='relu'))
model_more_layers.add(Dense(16, activation='relu'))
model_more_layers.add(Dense(2, activation='softmax'))

Compiling the model

In [51]:
# Compile the CNN model with more layers

model_more_layers.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])

Using Callbacks

In [52]:
# Let's define a callback to enable early stopping when validation accuracy stops improving across epochs

early_stopping_cb = tf.keras.callbacks.EarlyStopping(
   monitor='val_accuracy', min_delta=0.05, patience=5, verbose=1, restore_best_weights=True)

Fit and Train the model

In [53]:
# Train the model with more layers using training dataset and early stopping callback

model_history = model_more_layers.fit(X_train_normalized, y_train_encoded, validation_split=0.2, batch_size=32, verbose=1,
   epochs=50, callbacks=[early_stopping_cb])
Epoch 1/50
624/624 [==============================] - 7s 7ms/step - loss: 0.4835 - accuracy: 0.7569 - val_loss: 0.6069 - val_accuracy: 0.6879
Epoch 2/50
624/624 [==============================] - 4s 6ms/step - loss: 0.1972 - accuracy: 0.9238 - val_loss: 0.4102 - val_accuracy: 0.8834
Epoch 3/50
624/624 [==============================] - 4s 6ms/step - loss: 0.1491 - accuracy: 0.9455 - val_loss: 0.3377 - val_accuracy: 0.8704
Epoch 4/50
624/624 [==============================] - 4s 6ms/step - loss: 0.1253 - accuracy: 0.9540 - val_loss: 0.1671 - val_accuracy: 0.9223
Epoch 5/50
624/624 [==============================] - 4s 6ms/step - loss: 0.1230 - accuracy: 0.9538 - val_loss: 0.1907 - val_accuracy: 0.9287
Epoch 6/50
624/624 [==============================] - 4s 6ms/step - loss: 0.0901 - accuracy: 0.9666 - val_loss: 0.1936 - val_accuracy: 0.9411
Epoch 7/50
624/624 [==============================] - 4s 6ms/step - loss: 0.0746 - accuracy: 0.9723 - val_loss: 0.1393 - val_accuracy: 0.9531
Epoch 8/50
624/624 [==============================] - 4s 6ms/step - loss: 0.0638 - accuracy: 0.9764 - val_loss: 0.2543 - val_accuracy: 0.9000
Epoch 9/50
624/624 [==============================] - 4s 6ms/step - loss: 0.0537 - accuracy: 0.9801 - val_loss: 0.3777 - val_accuracy: 0.9281
Epoch 10/50
624/624 [==============================] - 4s 6ms/step - loss: 0.0360 - accuracy: 0.9867 - val_loss: 0.1612 - val_accuracy: 0.9601
Epoch 11/50
623/624 [============================>.] - ETA: 0s - loss: 0.0366 - accuracy: 0.9874Restoring model weights from the end of the best epoch: 6.
624/624 [==============================] - 4s 6ms/step - loss: 0.0367 - accuracy: 0.9874 - val_loss: 0.2140 - val_accuracy: 0.9447
Epoch 11: early stopping

Evaluating the model

In [54]:
# Let's evaluate the model with more layers for accuracy against test dataset

test_accuracy = model_more_layers.evaluate(X_test_normalized, y_test_encoded, verbose=2)
82/82 - 0s - loss: 0.1826 - accuracy: 0.9396 - 391ms/epoch - 5ms/step

Plotting the confusion matrix

In [55]:
# Predict the labels for the test dataset
y_pred = model_more_layers.predict(X_test_normalized)        # returns prediction probabilities for each label

# Convert the prediction probabilities back to single labels
y_pred_labels = np.argmax(y_pred, axis=1)
y_test_labels = np.argmax(y_test_encoded, axis=1)
82/82 [==============================] - 0s 2ms/step
In [56]:
# Print the Classification report
metrics = classification_report(y_test_labels, y_pred_labels)
print(metrics)
              precision    recall  f1-score   support

           0       0.93      0.95      0.94      1300
           1       0.95      0.93      0.94      1300

    accuracy                           0.94      2600
   macro avg       0.94      0.94      0.94      2600
weighted avg       0.94      0.94      0.94      2600

In [57]:
# Print the Confusion Matrix for test predictions

confusion_matrix = tf.math.confusion_matrix(y_test_labels, y_pred_labels)
_, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(confusion_matrix, annot=True, linewidths=.4, fmt="d", square=True, ax=ax)
plt.show()

Plotting the train and the validation curves

In [58]:
# Plot the accuracy of training and validation sets for the model with more layers

plotAccuracy(model_history, 'Model with more layers')

Record metrics for later comparison¶

In [59]:
# Collect the performance metrics for a later comparison of various models

precision, recall, f1_score, _ = precision_recall_fscore_support(y_test_labels, y_pred_labels)
accuracy = float(metrics.split()[-2])

model_performances = model_performances.append({
   'Model'                 : 'Using more layers',
   'Accuracy'              : accuracy,
   'Precision Parasitized' : round(precision[1], 2),
   'Precision Uninfected'  : round(precision[0], 2),
   'Recall Parasitized'    : round(recall[1], 2),
   'Recall Uninfected'     : round(recall[0], 2),
   'F1 Score Parasitized'  : round(f1_score[1], 2),
   'F1 Score Uninfected'   : round(f1_score[0], 2)
}, ignore_index=True)
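Parsing the overall accuracy out of the classification-report string with `metrics.split()[-2]` works, but it is fragile. A minimal sketch of a more robust alternative, computing accuracy directly from the label arrays (the function name is illustrative):

```python
import numpy as np

def accuracy_from_labels(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float((y_true == y_pred).mean())

# Toy example (0 = uninfected, 1 = parasitized): 3 of 4 correct
print(accuracy_from_labels([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```

The same value could also be obtained with sklearn's `accuracy_score(y_test_labels, y_pred_labels)`.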
In [60]:
# Let's look at the performances of models we have created so far

model_performances
Out[60]:
Model Accuracy Precision Parasitized Precision Uninfected Recall Parasitized Recall Uninfected F1 Score Parasitized F1 Score Uninfected
0 Base CNN Model 0.87 0.81 0.95 0.96 0.78 0.88 0.86
1 Using more layers 0.94 0.95 0.93 0.93 0.95 0.94 0.94

Think about it:

  • Can the model performance be improved if we change our activation function to LeakyReLU?
  • Can BatchNormalization improve our model?

Let us try building a model that uses BatchNormalization layers and LeakyReLU as the activation function.
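For intuition, LeakyReLU behaves like ReLU for positive inputs but lets a small signal through for negative inputs (alpha times the input), avoiding "dead" units. A quick numeric sketch with NumPy, using the same alpha=0.1 as the layers below:

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    """LeakyReLU: x for x > 0, alpha * x otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * x)

# Negative inputs are scaled by alpha instead of being zeroed out: [-0.2, 0.0, 3.0]
print(leaky_relu([-2.0, 0.0, 3.0]))
```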

Model 2 with Batch Normalization

In [61]:
# Clear Keras' backend to clear the history of previous models

keras_backend.clear_session()

# Re-init random numbers
random.seed(9)
np.random.seed(9)
tf.random.set_seed(9)

Building the Model

In [62]:
# Create a more complex model with additional layers including BatchNormalization layers, and use LeakyRelu as the activation function

model_batch_norm = Sequential()
model_batch_norm.add(Conv2D(16, (3, 3), padding="same", input_shape=(64, 64, 3)))
model_batch_norm.add(LeakyReLU(alpha=0.1))
model_batch_norm.add(Conv2D(32, (3, 3), padding="same"))
model_batch_norm.add(LeakyReLU(alpha=0.1))
model_batch_norm.add(MaxPooling2D(pool_size=(2, 2)))
model_batch_norm.add(BatchNormalization())
model_batch_norm.add(Conv2D(32, (3, 3), padding="same"))
model_batch_norm.add(LeakyReLU(alpha=0.1))
model_batch_norm.add(Conv2D(64, (3, 3), padding="same"))
model_batch_norm.add(LeakyReLU(alpha=0.1))
model_batch_norm.add(MaxPooling2D(pool_size=(2, 2)))
model_batch_norm.add(BatchNormalization())
model_batch_norm.add(Flatten())
model_batch_norm.add(Dense(32))
model_batch_norm.add(LeakyReLU(alpha=0.1))
model_batch_norm.add(Dropout(0.5))
model_batch_norm.add(Dense(2, activation='softmax'))

Compiling the model

In [63]:
# Compile the model with BatchNormalization and using LeakyRelu activation

model_batch_norm.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])

Using callbacks

In [64]:
# Define an early-stopping callback: stop if validation accuracy does not improve by at least 0.05 for 3 consecutive epochs

early_stopping_cb = tf.keras.callbacks.EarlyStopping(
   monitor='val_accuracy', min_delta=0.05, patience=3, verbose=1, restore_best_weights=True)

Fit and train the model

In [65]:
# Train the model with BatchNormalization and LeakyRelu using training dataset and early stopping callback

model_history = model_batch_norm.fit(X_train_normalized, y_train_encoded, validation_split=0.2, batch_size=32, verbose=1,
   epochs=50, callbacks=[early_stopping_cb])
Epoch 1/50
624/624 [==============================] - 9s 10ms/step - loss: 0.4434 - accuracy: 0.8185 - val_loss: 0.0167 - val_accuracy: 0.9968
Epoch 2/50
624/624 [==============================] - 6s 9ms/step - loss: 0.1313 - accuracy: 0.9562 - val_loss: 0.0945 - val_accuracy: 0.9898
Epoch 3/50
624/624 [==============================] - 6s 9ms/step - loss: 0.1053 - accuracy: 0.9676 - val_loss: 0.0999 - val_accuracy: 0.9740
Epoch 4/50
619/624 [============================>.] - ETA: 0s - loss: 0.0831 - accuracy: 0.9720Restoring model weights from the end of the best epoch: 1.
624/624 [==============================] - 6s 9ms/step - loss: 0.0828 - accuracy: 0.9721 - val_loss: 0.1908 - val_accuracy: 0.9459
Epoch 4: early stopping

Evaluating the model

In [66]:
# Let's evaluate the model for accuracy against test dataset

test_accuracy = model_batch_norm.evaluate(X_test_normalized, y_test_encoded, verbose=2)
82/82 - 1s - loss: 0.4548 - accuracy: 0.8719 - 509ms/epoch - 6ms/step

Generate the classification report and confusion matrix

In [67]:
# Predict the labels for the test dataset
y_pred = model_batch_norm.predict(X_test_normalized)        # returns prediction probabilities for each label

# Convert the prediction probabilities back to single labels
y_pred_labels = np.argmax(y_pred, axis=1)
y_test_labels = np.argmax(y_test_encoded, axis=1)

metrics = classification_report(y_test_labels, y_pred_labels)
print(metrics)
82/82 [==============================] - 0s 3ms/step
              precision    recall  f1-score   support

           0       0.80      1.00      0.89      1300
           1       0.99      0.75      0.85      1300

    accuracy                           0.87      2600
   macro avg       0.90      0.87      0.87      2600
weighted avg       0.90      0.87      0.87      2600

In [68]:
# Print the Confusion Matrix for test predictions

confusion_matrix = tf.math.confusion_matrix(y_test_labels, y_pred_labels)
_, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(confusion_matrix, annot=True, linewidths=.4, fmt="d", square=True, ax=ax)
plt.show()

Plotting the train and validation accuracy

In [69]:
# Plot the accuracy of training and validation sets for the model with BatchNormalization and LeakyReLU

plotAccuracy(model_history, 'Model with BatchNormalization and LeakyRelu')

Record metrics for later comparison¶

In [70]:
# Collect the performance metrics for a later comparison of various models

precision, recall, f1_score, _ = precision_recall_fscore_support(y_test_labels, y_pred_labels)
accuracy = float(metrics.split()[-2])

model_performances = model_performances.append({
   'Model'                 : 'Using Batch Normalization',
   'Accuracy'              : accuracy,
   'Precision Parasitized' : round(precision[1], 2),
   'Precision Uninfected'  : round(precision[0], 2),
   'Recall Parasitized'    : round(recall[1], 2),
   'Recall Uninfected'     : round(recall[0], 2),
   'F1 Score Parasitized'  : round(f1_score[1], 2),
   'F1 Score Uninfected'   : round(f1_score[0], 2)
}, ignore_index=True)
In [71]:
# print out the performances of models created so far

model_performances
Out[71]:
Model Accuracy Precision Parasitized Precision Uninfected Recall Parasitized Recall Uninfected F1 Score Parasitized F1 Score Uninfected
0 Base CNN Model 0.87 0.81 0.95 0.96 0.78 0.88 0.86
1 Using more layers 0.94 0.95 0.93 0.93 0.95 0.94 0.94
2 Using Batch Normalization 0.87 0.99 0.80 0.75 1.00 0.85 0.89

Observations and insights:

  • Introducing batch normalization with LeakyReLU activation did not help here: test accuracy (0.87) is back at the level of the base model and well below the deeper model (0.94), and early stopping restored the epoch 1 weights, suggesting the model began overfitting quickly

Model 3 with HSV images¶

We saw earlier that HSV images seemed to highlight the infection patterns more clearly. Let us train a similar model on the HSV images.

In [72]:
# Clear Keras' backend to clear the history of previous models

keras_backend.clear_session()

# Re-init random numbers
random.seed(12)
np.random.seed(12)
tf.random.set_seed(12)

Building the Model¶

In [73]:
# Create a similar model as the last one, with additional layers including BatchNormalization layers, and use LeakyRelu as the activation function

model_batch_norm_hsv = Sequential()
model_batch_norm_hsv.add(Conv2D(16, (3, 3), padding="same", input_shape=(64, 64, 3)))
model_batch_norm_hsv.add(LeakyReLU(alpha=0.1))
model_batch_norm_hsv.add(Conv2D(32, (3, 3), padding="same"))
model_batch_norm_hsv.add(LeakyReLU(alpha=0.1))
model_batch_norm_hsv.add(MaxPooling2D(pool_size=(2, 2)))
model_batch_norm_hsv.add(BatchNormalization())
model_batch_norm_hsv.add(Conv2D(32, (3, 3), padding="same"))
model_batch_norm_hsv.add(LeakyReLU(alpha=0.1))
model_batch_norm_hsv.add(Conv2D(64, (3, 3), padding="same"))
model_batch_norm_hsv.add(LeakyReLU(alpha=0.1))
model_batch_norm_hsv.add(MaxPooling2D(pool_size=(2, 2)))
model_batch_norm_hsv.add(BatchNormalization())
model_batch_norm_hsv.add(Flatten())
model_batch_norm_hsv.add(Dense(32))
model_batch_norm_hsv.add(LeakyReLU(alpha=0.1))
model_batch_norm_hsv.add(Dropout(0.5))
model_batch_norm_hsv.add(Dense(2, activation='softmax'))

Compiling the model¶

In [74]:
# Compile the model with BatchNormalization and using LeakyRelu activation

model_batch_norm_hsv.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])

Fit and train the model¶

In [75]:
# Train the model with BatchNormalization and LeakyRelu using training dataset and early stopping callback

model_history = model_batch_norm_hsv.fit(X_train_normalized_hsv, y_train_encoded, validation_split=0.2, batch_size=32, verbose=1,
   epochs=50, callbacks=[early_stopping_cb])
Epoch 1/50
624/624 [==============================] - 9s 10ms/step - loss: 0.3870 - accuracy: 0.8598 - val_loss: 0.3046 - val_accuracy: 0.9573
Epoch 2/50
624/624 [==============================] - 6s 9ms/step - loss: 0.1371 - accuracy: 0.9577 - val_loss: 0.0241 - val_accuracy: 0.9944
Epoch 3/50
624/624 [==============================] - 6s 9ms/step - loss: 0.1168 - accuracy: 0.9645 - val_loss: 0.2881 - val_accuracy: 0.9513
Epoch 4/50
623/624 [============================>.] - ETA: 0s - loss: 0.0915 - accuracy: 0.9725Restoring model weights from the end of the best epoch: 1.
624/624 [==============================] - 6s 9ms/step - loss: 0.0914 - accuracy: 0.9725 - val_loss: 0.3137 - val_accuracy: 0.9361
Epoch 4: early stopping

Evaluating the model¶

In [76]:
# Let's evaluate the model for accuracy against test dataset

test_accuracy = model_batch_norm_hsv.evaluate(X_test_normalized_hsv, y_test_encoded, verbose=2)
82/82 - 0s - loss: 0.2903 - accuracy: 0.9408 - 457ms/epoch - 6ms/step

Generate the classification report and confusion matrix¶

In [77]:
# Predict the labels for the test dataset
y_pred = model_batch_norm_hsv.predict(X_test_normalized_hsv)        # returns prediction probabilities for each label

# Convert the prediction probabilities back to single labels
y_pred_labels = np.argmax(y_pred, axis=1)
y_test_labels = np.argmax(y_test_encoded, axis=1)

metrics = classification_report(y_test_labels, y_pred_labels)
print(metrics)
82/82 [==============================] - 0s 3ms/step
              precision    recall  f1-score   support

           0       0.94      0.94      0.94      1300
           1       0.94      0.94      0.94      1300

    accuracy                           0.94      2600
   macro avg       0.94      0.94      0.94      2600
weighted avg       0.94      0.94      0.94      2600

In [78]:
# Print the Confusion Matrix for test predictions

confusion_matrix = tf.math.confusion_matrix(y_test_labels, y_pred_labels)
_, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(confusion_matrix, annot=True, linewidths=.4, fmt="d", square=True, ax=ax)
plt.show()

Plotting the train and validation accuracy¶

In [79]:
# Plot the accuracy of training and validation sets for the model trained on HSV images

plotAccuracy(model_history, 'Model with BatchNormalization and LeakyRelu that uses HSV images')

Record metrics for later comparison¶

In [80]:
# Collect the performance metrics for a later comparison of various models

precision, recall, f1_score, _ = precision_recall_fscore_support(y_test_labels, y_pred_labels)
accuracy = float(metrics.split()[-2])

model_performances = model_performances.append({
   'Model'                 : 'Using HSV images',
   'Accuracy'              : accuracy,
   'Precision Parasitized' : round(precision[1], 2),
   'Precision Uninfected'  : round(precision[0], 2),
   'Recall Parasitized'    : round(recall[1], 2),
   'Recall Uninfected'     : round(recall[0], 2),
   'F1 Score Parasitized'  : round(f1_score[1], 2),
   'F1 Score Uninfected'   : round(f1_score[0], 2)
}, ignore_index=True)
In [81]:
# print out the performances of models created so far

model_performances
Out[81]:
Model Accuracy Precision Parasitized Precision Uninfected Recall Parasitized Recall Uninfected F1 Score Parasitized F1 Score Uninfected
0 Base CNN Model 0.87 0.81 0.95 0.96 0.78 0.88 0.86
1 Using more layers 0.94 0.95 0.93 0.93 0.95 0.94 0.94
2 Using Batch Normalization 0.87 0.99 0.80 0.75 1.00 0.85 0.89
3 Using HSV images 0.94 0.94 0.94 0.94 0.94 0.94 0.94

Observations and insights:¶

  • Training on HSV images brought test accuracy back up to 0.94, a clear improvement over the batch-normalization model and on par with the deeper RGB model
  • Let us experiment with other options; we can revisit HSV images later
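As a reminder of the colour-space change (the HSV tensors `X_train_normalized_hsv` were prepared earlier), the standard RGB-to-HSV mapping can be checked per pixel with Python's stdlib `colorsys`. This is only an illustration, not the batch conversion used for the training data:

```python
import colorsys

# Pure red in normalized RGB maps to hue 0, full saturation, full value
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0)

# HSV separates chromatic information (hue, saturation) from brightness (value),
# which is why stained parasite regions may stand out more clearly
h, s, v = colorsys.rgb_to_hsv(0.6, 0.2, 0.5)
```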

Think about it:

  • Can we improve the model with Image Data Augmentation?
  • References to image data augmentation can be seen below:
    • Image Augmentation for Computer Vision
    • How to Configure Image Data Augmentation in Keras?

Model 4 with Data Augmentation

In [82]:
# Clear Keras' backend to clear the history of previous models

keras_backend.clear_session()

# Re-init random numbers
random.seed(15)
np.random.seed(15)
tf.random.set_seed(15)

Use image data generator

In [83]:
# Create the image data generator using Keras

from keras.preprocessing.image import ImageDataGenerator

data_generator = ImageDataGenerator(rotation_range=45,        # rotate images randomly by up to 45 degrees
                                    zoom_range=0.2,           # zoom the image by up to 20%
                                    horizontal_flip=True,     # flip the image horizontally
                                    vertical_flip=True,       #      and vertically, at random
                                    fill_mode='nearest')

Think about it:

  • Check whether the performance of the model can be improved by changing the parameters of the ImageDataGenerator.
    • We tried a few variations; since our images do not have strong edges, the differences between them were small
    • We also combined several transformations into a single pass

Visualizing Augmented images

In [84]:
#  Visualize a sampling of 18 augmented images

aug_images = []

data_iterator = data_generator.flow(X_train_normalized, y_train_encoded, batch_size=1)

for i in range(18):
   aug_image = next(data_iterator)[0][0, :]
   aug_images.append(aug_image)

aug_images = np.array(aug_images)

fig = plt.figure(figsize=(12, 6))                           # Defining the figure size to 12x6

rows = 3                                                    # number of rows in the plot
cols = 6                                                    # number of columns in the plot

for i in range(18):                                         # Iterate over the 18 augmented images
   ax = fig.add_subplot(rows, cols, i+1)                    # Adding subplot
   ax.imshow(aug_images[i, :], cmap=plt.get_cmap('gray'))   # Plotting the image using cmap=gray
   plt.xticks([])                                           # Remove unnecessary ticks on the x axis
   plt.yticks([])                                           # Remove unnecessary ticks on the y axis
plt.show()

Observations and insights:

  • The augmented images are very similar to the originals, with random rotations, zooms, and flips acting as noise that could help the model generalize better

Building the Model

In [85]:
# We can use the same architecture as the last model we built to use with augmented training images

model_augmented = Sequential()
model_augmented.add(Conv2D(16, (3, 3), padding="same", input_shape=(64, 64, 3)))
model_augmented.add(LeakyReLU(alpha=0.1))
model_augmented.add(Conv2D(32, (3, 3), padding="same"))
model_augmented.add(LeakyReLU(alpha=0.1))
model_augmented.add(MaxPooling2D(pool_size=(2, 2)))
model_augmented.add(BatchNormalization())
model_augmented.add(Conv2D(32, (3, 3), padding="same"))
model_augmented.add(LeakyReLU(alpha=0.1))
model_augmented.add(Conv2D(64, (3, 3), padding="same"))
model_augmented.add(LeakyReLU(alpha=0.1))
model_augmented.add(MaxPooling2D(pool_size=(2, 2)))
model_augmented.add(BatchNormalization())
model_augmented.add(Flatten())
model_augmented.add(Dense(32))
model_augmented.add(LeakyReLU(alpha=0.1))
model_augmented.add(Dropout(0.5))
model_augmented.add(Dense(2, activation='softmax'))

model_augmented.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])

Using Callbacks

In [86]:
# Early stopping on training accuracy: stop if it does not improve by at least 0.1 for 3 consecutive epochs

early_stopping_cb = tf.keras.callbacks.EarlyStopping(
   monitor='accuracy', min_delta=0.1, patience=3, verbose=1, restore_best_weights=True)

Fit and Train the model

In [87]:
# Prep the image iterators

# Generate augmented images from in-memory images of training dataset
data_iterator = data_generator.flow(X_train_normalized, y_train_encoded, batch_size=32)

# Generate validation images from test dataset.
# We are not applying any transformation to these validation images, so that we can validate against real data
val_generator = ImageDataGenerator()
val_iterator = val_generator.flow(X_test_normalized, y_test_encoded, batch_size=32)
In [88]:
# Train the model using prepared iterators that use training and test images

model_history = model_augmented.fit(data_iterator, steps_per_epoch=len(X_train_normalized) // 32,
   epochs=50, callbacks=[early_stopping_cb],
   validation_data = val_iterator,
   validation_steps=X_test_normalized.shape[0] // 32
   )
Epoch 1/50
779/779 [==============================] - 29s 34ms/step - loss: 0.4903 - accuracy: 0.8198 - val_loss: 0.1380 - val_accuracy: 0.9537
Epoch 2/50
779/779 [==============================] - 26s 33ms/step - loss: 0.1632 - accuracy: 0.9518 - val_loss: 0.0718 - val_accuracy: 0.9826
Epoch 3/50
779/779 [==============================] - 26s 33ms/step - loss: 0.1289 - accuracy: 0.9620 - val_loss: 0.2518 - val_accuracy: 0.9375
Epoch 4/50
779/779 [==============================] - 26s 33ms/step - loss: 0.1151 - accuracy: 0.9639 - val_loss: 0.0871 - val_accuracy: 0.9730
Epoch 5/50
779/779 [==============================] - ETA: 0s - loss: 0.1085 - accuracy: 0.9667Restoring model weights from the end of the best epoch: 2.
779/779 [==============================] - 26s 33ms/step - loss: 0.1085 - accuracy: 0.9667 - val_loss: 0.0526 - val_accuracy: 0.9819
Epoch 5: early stopping

Evaluating the model

In [89]:
# Test the accuracy against test data

test_accuracy = model_augmented.evaluate(X_test_normalized, y_test_encoded, verbose=2)
82/82 - 0s - loss: 0.0717 - accuracy: 0.9827 - 344ms/epoch - 4ms/step

Plot the train and validation accuracy

In [90]:
# Plot the accuracy of training and validation sets for the model that used augmented images

plotAccuracy(model_history, 'Model using Augmented images')

Plotting the classification report and confusion matrix

In [91]:
# Predict the labels for the test dataset
y_pred = model_augmented.predict(X_test_normalized)        # returns prediction probabilities for each label

# Convert the prediction probabilities back to single labels
y_pred_labels = np.argmax(y_pred, axis=1)
y_test_labels = np.argmax(y_test_encoded, axis=1)

metrics = classification_report(y_test_labels, y_pred_labels)
print(metrics)
82/82 [==============================] - 0s 3ms/step
              precision    recall  f1-score   support

           0       0.98      0.98      0.98      1300
           1       0.98      0.98      0.98      1300

    accuracy                           0.98      2600
   macro avg       0.98      0.98      0.98      2600
weighted avg       0.98      0.98      0.98      2600

In [92]:
# Print the Confusion Matrix for test predictions

confusion_matrix = tf.math.confusion_matrix(y_test_labels, y_pred_labels)
_, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(confusion_matrix, annot=True, linewidths=.4, fmt="d", square=True, ax=ax)
Out[92]:
<Axes: >

Record metrics for later comparison¶

In [93]:
# Collect the performance metrics for a later comparison of various models

precision, recall, f1_score, _ = precision_recall_fscore_support(y_test_labels, y_pred_labels)
accuracy = float(metrics.split()[-2])

model_performances = model_performances.append({
   'Model'                 : 'Using augmented images',
   'Accuracy'              : accuracy,
   'Precision Parasitized' : round(precision[1], 2),
   'Precision Uninfected'  : round(precision[0], 2),
   'Recall Parasitized'    : round(recall[1], 2),
   'Recall Uninfected'     : round(recall[0], 2),
   'F1 Score Parasitized'  : round(f1_score[1], 2),
   'F1 Score Uninfected'   : round(f1_score[0], 2)
}, ignore_index=True)
In [94]:
# print out the performances of models created so far

model_performances
Out[94]:
Model Accuracy Precision Parasitized Precision Uninfected Recall Parasitized Recall Uninfected F1 Score Parasitized F1 Score Uninfected
0 Base CNN Model 0.87 0.81 0.95 0.96 0.78 0.88 0.86
1 Using more layers 0.94 0.95 0.93 0.93 0.95 0.94 0.94
2 Using Batch Normalization 0.87 0.99 0.80 0.75 1.00 0.85 0.89
3 Using HSV images 0.94 0.94 0.94 0.94 0.94 0.94 0.94
4 Using augmented images 0.98 0.98 0.98 0.98 0.98 0.98 0.98

Later, we will also try a pretrained model (VGG16). First, let us see whether combining the original and augmented images helps further.

Model 5 with Original + Augmented Data¶

The previous model trained only on augmented images. Let us train on both the original and the augmented images, as this might perform even better.

In [95]:
# Clear Keras' backend to clear the history of previous models

keras_backend.clear_session()

# Re-init random numbers
random.seed(18)
np.random.seed(18)
tf.random.set_seed(18)

Use combined image data generators¶

In [96]:
# Let's use both original dataset along with the augmented images generated

# we already have a ImageDataGenerator for augmented images, we will re-use it here

# Create a generator for the original (untransformed) images
original_generator = ImageDataGenerator()

# collect flow'ed images from both original and augmented/generated images
X_train_original = original_generator.flow(X_train_normalized, y_train_encoded, batch_size=32)
X_train_augmented = data_generator.flow(X_train_normalized, y_train_encoded, batch_size=32)

# Combine the two generators so each yielded batch contains both original and augmented images
def CombineGenerators(gen1, gen2):
    while True:
        x1, y1 = next(gen1)
        x2, y2 = next(gen2)
        yield np.concatenate((x1, x2), axis=0), np.concatenate((y1, y2), axis=0)

X_train_combined = CombineGenerators(X_train_original, X_train_augmented)

Building the Model¶

In [97]:
# We can use the same architecture as the last model we built to use with augmented training images

model_orig_aug = Sequential()
model_orig_aug.add(Conv2D(16, (3, 3), padding="same", input_shape=(64, 64, 3)))
model_orig_aug.add(LeakyReLU(alpha=0.1))
model_orig_aug.add(Conv2D(32, (3, 3), padding="same"))
model_orig_aug.add(LeakyReLU(alpha=0.1))
model_orig_aug.add(MaxPooling2D(pool_size=(2, 2)))
model_orig_aug.add(BatchNormalization())
model_orig_aug.add(Conv2D(32, (3, 3), padding="same"))
model_orig_aug.add(LeakyReLU(alpha=0.1))
model_orig_aug.add(Conv2D(64, (3, 3), padding="same"))
model_orig_aug.add(LeakyReLU(alpha=0.1))
model_orig_aug.add(MaxPooling2D(pool_size=(2, 2)))
model_orig_aug.add(BatchNormalization())
model_orig_aug.add(Flatten())
model_orig_aug.add(Dense(32))
model_orig_aug.add(LeakyReLU(alpha=0.1))
model_orig_aug.add(Dropout(0.5))
model_orig_aug.add(Dense(2, activation='softmax'))

model_orig_aug.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])

Fit and Train the model¶

In [98]:
# Train the model using prepared iterators that use training and test images

model_history = model_orig_aug.fit(X_train_combined,
   steps_per_epoch=len(X_train_normalized) * 2 // 32,
   epochs=50, callbacks=[early_stopping_cb],
   validation_data = val_iterator,
   validation_steps=X_test_normalized.shape[0] // 32
   )
Epoch 1/50
1559/1559 [==============================] - 59s 36ms/step - loss: 0.2557 - accuracy: 0.9064 - val_loss: 0.0605 - val_accuracy: 0.9865
Epoch 2/50
1559/1559 [==============================] - 55s 35ms/step - loss: 0.0902 - accuracy: 0.9714 - val_loss: 0.0673 - val_accuracy: 0.9807
Epoch 3/50
1559/1559 [==============================] - 55s 35ms/step - loss: 0.0797 - accuracy: 0.9746 - val_loss: 0.0467 - val_accuracy: 0.9869
Epoch 4/50
1559/1559 [==============================] - ETA: 0s - loss: 0.0755 - accuracy: 0.9756Restoring model weights from the end of the best epoch: 1.
1559/1559 [==============================] - 55s 35ms/step - loss: 0.0755 - accuracy: 0.9756 - val_loss: 0.0467 - val_accuracy: 0.9846
Epoch 4: early stopping

Evaluating the Model¶

In [99]:
# Test the accuracy against test data

test_accuracy = model_orig_aug.evaluate(X_test_normalized, y_test_encoded, verbose=2)
82/82 - 0s - loss: 0.0603 - accuracy: 0.9865 - 330ms/epoch - 4ms/step

Plot the train and validation accuracy¶

In [100]:
# Plot the accuracy of training and validation sets for the model trained on original + augmented images

plotAccuracy(model_history, 'Model using Original + Augmented images')

Plotting the classification report and confusion matrix¶

In [101]:
# Predict the labels for the test dataset
y_pred = model_orig_aug.predict(X_test_normalized)        # returns prediction probabilities for each label

# Convert the prediction probabilities back to single labels
y_pred_labels = np.argmax(y_pred, axis=1)
y_test_labels = np.argmax(y_test_encoded, axis=1)

metrics = classification_report(y_test_labels, y_pred_labels)
print(metrics)
82/82 [==============================] - 0s 3ms/step
              precision    recall  f1-score   support

           0       0.98      0.99      0.99      1300
           1       0.99      0.98      0.99      1300

    accuracy                           0.99      2600
   macro avg       0.99      0.99      0.99      2600
weighted avg       0.99      0.99      0.99      2600

In [102]:
# Print the Confusion Matrix for test predictions

confusion_matrix = tf.math.confusion_matrix(y_test_labels, y_pred_labels)
_, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(confusion_matrix, annot=True, linewidths=.4, fmt="d", square=True, ax=ax)
Out[102]:
<Axes: >

Record metrics for later comparison¶

In [103]:
# Collect the performance metrics for a later comparison of various models

precision, recall, f1_score, _ = precision_recall_fscore_support(y_test_labels, y_pred_labels)
accuracy = float(metrics.split()[-2])

model_performances = model_performances.append({
   'Model'                 : 'Using original AND augmented images',
   'Accuracy'              : accuracy,
   'Precision Parasitized' : round(precision[1], 2),
   'Precision Uninfected'  : round(precision[0], 2),
   'Recall Parasitized'    : round(recall[1], 2),
   'Recall Uninfected'     : round(recall[0], 2),
   'F1 Score Parasitized'  : round(f1_score[1], 2),
   'F1 Score Uninfected'   : round(f1_score[0], 2)
}, ignore_index=True)
In [104]:
# print out the performances of models created so far

model_performances
Out[104]:
Model Accuracy Precision Parasitized Precision Uninfected Recall Parasitized Recall Uninfected F1 Score Parasitized F1 Score Uninfected
0 Base CNN Model 0.87 0.81 0.95 0.96 0.78 0.88 0.86
1 Using more layers 0.94 0.95 0.93 0.93 0.95 0.94 0.94
2 Using Batch Normalization 0.87 0.99 0.80 0.75 1.00 0.85 0.89
3 Using HSV images 0.94 0.94 0.94 0.94 0.94 0.94 0.94
4 Using augmented images 0.98 0.98 0.98 0.98 0.98 0.98 0.98
5 Using original AND augmented images 0.99 0.99 0.98 0.98 0.99 0.99 0.99

Pre-trained model (VGG16)¶

  • Import the VGG16 network up to any layer you choose
  • Add Fully Connected Layers on top of it
In [105]:
# Clear Keras' backend to clear the history of previous models

keras_backend.clear_session()

# Re-init random numbers
random.seed(21)
np.random.seed(21)
tf.random.set_seed(21)
In [106]:
#  Use the pre-trained model VGG16 without the top layers

from keras.applications import VGG16

vgg_pretrained = VGG16(weights='imagenet',          # use weights that are pre-trained on ImageNet
                       include_top=False,           # do not include 3 fully connected layers at the top of network
                       input_shape=(64, 64, 3))     # image shape for the input layer

for layer in vgg_pretrained.layers:
   layer.trainable = False                          # no need to train the pre-trained layers

model_vgg_based = Sequential()                      # this is our model that will make use of pre-trained VGG16
model_vgg_based.add(vgg_pretrained)                 # add the pre-trained VGG16 as a layer to our model

# Add our own fully connected layers
model_vgg_based.add(Flatten())
model_vgg_based.add(Dense(32))
model_vgg_based.add(LeakyReLU(alpha=0.1))
model_vgg_based.add(Dense(2, activation='softmax'))

Compiling the model

In [107]:
# Compile our model that uses transfer learning

model_vgg_based.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])

Using Callbacks

In [108]:
# Define an early-stopping callback: stop if validation accuracy does not improve by at least 0.05 for 3 consecutive epochs

early_stopping_cb = tf.keras.callbacks.EarlyStopping(
   monitor='val_accuracy', min_delta=0.05, patience=3, verbose=1, restore_best_weights=True)

Fit and Train the model

In [109]:
# Train the model that uses learnings from VGG16

model_history = model_vgg_based.fit(X_train_normalized, y_train_encoded, validation_split=0.2, batch_size=32,
   verbose=1, epochs=50, callbacks=[early_stopping_cb])
Epoch 1/50
624/624 [==============================] - 14s 20ms/step - loss: 0.1869 - accuracy: 0.9253 - val_loss: 0.3552 - val_accuracy: 0.8644
Epoch 2/50
624/624 [==============================] - 11s 17ms/step - loss: 0.1362 - accuracy: 0.9495 - val_loss: 0.2502 - val_accuracy: 0.9004
Epoch 3/50
624/624 [==============================] - 11s 18ms/step - loss: 0.1299 - accuracy: 0.9495 - val_loss: 0.2677 - val_accuracy: 0.8972
Epoch 4/50
621/624 [============================>.] - ETA: 0s - loss: 0.1234 - accuracy: 0.9532Restoring model weights from the end of the best epoch: 1.
624/624 [==============================] - 11s 18ms/step - loss: 0.1233 - accuracy: 0.9533 - val_loss: 0.3400 - val_accuracy: 0.8590
Epoch 4: early stopping

Evaluating the model¶

In [110]:
# Test the accuracy against test data

test_accuracy = model_vgg_based.evaluate(X_test_normalized, y_test_encoded, verbose=2)
82/82 - 2s - loss: 0.2535 - accuracy: 0.9008 - 2s/epoch - 20ms/step

Plot the train and validation accuracy

In [111]:
# Plot the accuracy of training and validation sets for the model that uses VGG16 transfer learning

plotAccuracy(model_history, 'Model using VGG16 transfer learning')

Plotting the classification report and confusion matrix¶

In [112]:
# Predict the labels for the test dataset
y_pred = model_vgg_based.predict(X_test_normalized)        # returns prediction probabilities for each label

# Convert the prediction probablities back to single labels
y_pred_labels = np.argmax(y_pred, axis=1)
y_test_labels = np.argmax(y_test_encoded, axis=1)

metrics = classification_report(y_test_labels, y_pred_labels)
print(metrics)
82/82 [==============================] - 1s 13ms/step
              precision    recall  f1-score   support

           0       0.98      0.82      0.89      1300
           1       0.84      0.98      0.91      1300

    accuracy                           0.90      2600
   macro avg       0.91      0.90      0.90      2600
weighted avg       0.91      0.90      0.90      2600

In [113]:
# Print the Confusion Matrix for test predictions

confusion_matrix = tf.math.confusion_matrix(y_test_labels, y_pred_labels)
_, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(confusion_matrix, annot=True, linewidths=.4, fmt="d", square=True, ax=ax)
plt.show()
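The raw counts above can also be normalized per true class, so that each row of the matrix reads directly as recall. A minimal sketch (not from the notebook, using scikit-learn's `confusion_matrix` on a tiny illustrative label set; recent scikit-learn versions also accept `normalize='true'` directly):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def normalized_confusion(y_true, y_pred, labels=(0, 1)):
    """Return the confusion matrix with each row normalized to sum to 1."""
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    return cm / cm.sum(axis=1, keepdims=True)

# Tiny illustrative example: 2 true negatives, 4 true positives
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 0]
print(normalized_confusion(y_true, y_pred))   # rows: [[0.5, 0.5], [0.25, 0.75]]
```

The normalized view makes it easier to compare models whose test sets differ in size.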

Record metrics for later comparison¶

In [114]:
# Collect the performance metrics for a later comparison of various models

from sklearn.metrics import accuracy_score

precision, recall, f1_score, _ = precision_recall_fscore_support(y_test_labels, y_pred_labels)
accuracy = round(accuracy_score(y_test_labels, y_pred_labels), 2)   # compute accuracy directly rather than parsing the report text

model_performances = model_performances.append({
   'Model'                 : 'Transfer from VGG16',
   'Accuracy'              : accuracy,
   'Precision Parasitized' : round(precision[1], 2),
   'Precision Uninfected'  : round(precision[0], 2),
   'Recall Parasitized'    : round(recall[1], 2),
   'Recall Uninfected'     : round(recall[0], 2),
   'F1 Score Parasitized'  : round(f1_score[1], 2),
   'F1 Score Uninfected'   : round(f1_score[0], 2)
}, ignore_index=True)
In [115]:
# print out the performances of models created so far

model_performances
Out[115]:
Model Accuracy Precision Parasitized Precision Uninfected Recall Parasitized Recall Uninfected F1 Score Parasitized F1 Score Uninfected
0 Base CNN Model 0.87 0.81 0.95 0.96 0.78 0.88 0.86
1 Using more layers 0.94 0.95 0.93 0.93 0.95 0.94 0.94
2 Using Batch Normalization 0.87 0.99 0.80 0.75 1.00 0.85 0.89
3 Using HSV images 0.94 0.94 0.94 0.94 0.94 0.94 0.94
4 Using augmented images 0.98 0.98 0.98 0.98 0.98 0.98 0.98
5 Using original AND augmented images 0.99 0.99 0.98 0.98 0.99 0.99 0.99
6 Transfer from VGG16 0.90 0.84 0.98 0.98 0.82 0.91 0.89

Observations and insights:¶

  • What can be observed from the validation and train curves?
    • The training accuracy improves steadily with each epoch
    • The validation accuracy fluctuates noticeably compared to the training curve; early stopping with restore_best_weights guards against this by keeping the weights from the best epoch, which helps the model generalize better

Compare the performance of various models that we built¶

In [116]:
# Plot the performance with all metrics, for all models we built, so that we can compare them visually

perf_comparison = model_performances.copy(deep=True)            # make a copy, so that original dataframe can be left intact

perf_comparison.set_index('Model', inplace=True)
perf_comparison.plot(kind='bar', figsize=(10, 5))
plt.title('Compare performance of models we built')
plt.xlabel('Model')
plt.ylabel('Score')
plt.legend(bbox_to_anchor=(1.30, 1.0), loc='upper right')
plt.show()

Solution Summary¶

Think about it:

  • What observations and insights can be drawn from the confusion matrix and classification report?
  • Choose the model with the best accuracy scores from all the above models and save it as a final model.

Observations and insights¶

  • Comparing the performance of all the models we have built (using accuracy and the other metrics plotted above), it is clear that models 4 and 5, which use augmented images, outperform the rest
  • Since model 5, which uses both original and augmented images, has the best accuracy and equal or better precision and recall than model 4, we choose Model 5 as our Final Model
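The choice above can also be made programmatically from the comparison table. A small sketch (on an illustrative subset of the table, with values mirroring the report above) that picks the row with the highest accuracy:

```python
import pandas as pd

# Illustrative subset of the model comparison table
perf = pd.DataFrame({
    'Model':    ['Using augmented images',
                 'Using original AND augmented images',
                 'Transfer from VGG16'],
    'Accuracy': [0.98, 0.99, 0.90],
})

# Select the model name on the row with the highest accuracy
best = perf.loc[perf['Accuracy'].idxmax(), 'Model']
print(best)   # → Using original AND augmented images
```

With ties, a secondary sort on Recall Parasitized would be the natural tie-breaker, since a missed infection is the costlier error here.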
In [117]:
# Choose final model to recommend

final_model = model_orig_aug
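Before deployment, the chosen model should be persisted to disk. A minimal sketch of the Keras save/load round trip (the filename is an assumption, and a tiny stand-in model is used here so the snippet runs on its own; in the notebook the saved object would be `model_orig_aug`):

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in for final_model, just to demonstrate the round trip
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(2, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam')

model.save('malaria_final_model.h5')                       # hypothetical filename
restored = tf.keras.models.load_model('malaria_final_model.h5')

# The restored model reproduces the original model's predictions
x = np.random.rand(3, 4).astype('float32')
print(np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0)))
```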

Observations and Conclusions drawn from the final model:¶

  • Model 5, trained on Original + Augmented Data, is the best performing model among those we built
  • With an accuracy of 99%, it leads all the other models
  • On the other metrics as well (precision, recall and F1 score), our final model is consistent and matches or beats every other model
  • So, we recommend model_orig_aug as our final model to deploy to the real world

Recommendations for Implementation ¶

Recommendations

  • Our recommendation is to deploy our final model across geographies as soon as possible, as it can help prevent hospitalization or even death of infected people
  • We also recommend that the stakeholders continue this study to fine-tune the final model or to develop a better performing one
  • We further recommend investing in the ongoing collection of additional samples from different parts of the world to enrich the dataset, and re-training the model on the new samples to make it more effective
  • On a periodic basis, plan to roll out updated models whenever a model with a significant performance improvement has been attained

Cost vs Benefits

  • The cost of developing and deploying these models is insignificant compared to the potential benefit of saving hundreds of thousands of human lives.

Risks and Challenges

  • A possible risk is that the model misdiagnoses an infected person as uninfected, which could cost a human life. Fortunately, our final model's recall is high enough that the benefits outweigh this small risk
  • We need to keep re-calibrating the model so that it keeps learning any variations that may develop in the field

Improvements that can be done:

  • Can the model performance be improved using other pre-trained models or different CNN architecture?

    • There is always room for improvement. We can try introducing different combinations of convolutional, max pooling, batch normalization, dropout and dense layers into our model. We can also use VGG19 or ResNet to transfer their learnings into our models.
    • All of these variations need to be trained, evaluated and compared to identify a model that performs better than our current final model
  • We can build a variation of Model 5 that uses HSV images to generate augmented data and compare that with other models
    • As we noted earlier, the HSV images seemed to highlight the spots in parasitized cells better, and we were able to build a well-performing model with them
    • Our final model uses augmented images and has proved to have the best accuracy
    • Combining the two, we can convert the images to HSV, generate augmented images from them, and train a new model on the HSV images together with their augmented versions
    • This is certainly one of the natural next steps to improve our final model further
  • While this model is in production, we can continue to collect additional samples that can be fed into a continuous learning loop to train on those new samples. This also can improve the accuracy of our final model.
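The HSV variation suggested above could be wired up as follows. This is a sketch of one possible pipeline, not the notebook's implementation: it assumes the images are already normalized to [0, 1] and uses matplotlib's `rgb_to_hsv` before handing the batch to the same augmentation step used for model 5.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv

def to_hsv_batch(rgb_images):
    """Convert a batch of RGB images (values in [0, 1]) to HSV.

    rgb_to_hsv also accepts a full (N, H, W, 3) array directly;
    the per-image loop is kept only for clarity.
    """
    return np.stack([rgb_to_hsv(img) for img in rgb_images])

# Tiny synthetic batch standing in for the normalized 64x64 cell images
rgb_batch = np.random.rand(4, 64, 64, 3).astype('float32')
hsv_batch = to_hsv_batch(rgb_batch)

# hsv_batch could then feed the same ImageDataGenerator used for model 5
print(hsv_batch.shape)   # (4, 64, 64, 3)
```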

Insights¶

Refined insights:¶

  • What are the most meaningful insights from the data relevant to the problem?

    • Our dataset is well balanced between parasitized and uninfected samples, so it does not introduce any class bias
    • On visual inspection of the samples, there do not seem to be any complex patterns involved: all parasitized images have spots of varying color and intensity
    • Converting the RGB images to HSV color space seems to bring out the spots with better contrast
    • Gaussian blurring does not seem to help here, as our samples do not contain sharp edges

Comparison of various techniques and their relative performance:¶

  • How do different techniques perform? Which one is performing relatively better? Is there scope to improve the performance further?
    • All of our models performed reasonably well, though with varying levels of accuracy
    • Augmenting images proved to be a powerful technique, as models 4 and 5, which use augmented images, are the top performers
    • Still, there is definitely scope for improvement, as detailed in the section above

Proposal for the final solution design:¶

  • What model do you propose to be adopted? Why is this the best solution to adopt?
    • Model 5, with Original + Augmented Data, has clearly outperformed the other models we built, and seems to generalize very well
    • There is potential to improve this model further, but that will take additional time and resources
    • Time to market is of the essence here: given the dire need to identify malaria-infected patients early, protect them, and reduce malaria deaths around the world, the sooner we deploy a good solution the better
    • While we may still be able to fine-tune our model with the additional activities identified above, our final model is already good enough to save lives
    • So, our recommendation is:
      • Deploy our final model now
      • Continue to invest in fine-tuning the model
      • Rollout the updated model when we have significant improvement